Calhoun


Institutional Archive of the Naval Postgraduate School 





Calhoun: The NPS Institutional Archive 
DSpace Repository 




1992-03 


Fitting and prediction uncertainty for a
software reliability model


Dennison, Thomas E. 


Monterey, California. Naval Postgraduate School 
http://hdl.handle.net/10945/23678


This publication is a work of the U.S. Government as defined in Title 17, United 
States Code, Section 101. Copyright protection is not available for this work in the 
United States. 


Downloaded from NPS Archive: Calhoun

Calhoun is the Naval Postgraduate School's public access digital repository for
research materials and institutional publications created by the NPS community.
Calhoun is named for Professor of Mathematics Guy K. Calhoun, NPS's first
appointed -- and published -- scholarly author.

Dudley Knox Library / Naval Postgraduate School
411 Dyer Road / 1 University Circle
Monterey, California USA 93943

http://www.nps.edu/library



Unclassified
SECURITY CLASSIFICATION OF THIS PAGE

REPORT DOCUMENTATION PAGE
Form Approved, OMB No. 0704-0188

1a. REPORT SECURITY CLASSIFICATION: Unclassified
3. DISTRIBUTION/AVAILABILITY OF REPORT: Approved for public release; distribution is unlimited
6a. NAME OF PERFORMING ORGANIZATION: Naval Postgraduate School
6b. OFFICE SYMBOL: OR
6c. ADDRESS (City, State, and ZIP Code): Monterey, CA 93943-5000
11. TITLE (Including Security Classification): FITTING AND PREDICTION UNCERTAINTY FOR A SOFTWARE RELIABILITY MODEL
12. PERSONAL AUTHOR(S): DENNISON, Thomas E.
16. SUPPLEMENTAL NOTATION: The views expressed in this thesis are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
18. SUBJECT TERMS: Software, Reliability, Bootstrap, Prediction Analysis, Software Reliability Models, Bayesian Methodology



20. DISTRIBUTION/AVAILABILITY OF ABSTRACT: UNCLASSIFIED/UNLIMITED
21. ABSTRACT SECURITY CLASSIFICATION: Unclassified
22a. NAME OF RESPONSIBLE INDIVIDUAL: Donald P. Gaver
22b. TELEPHONE (Include Area Code): (408) 646-2605
22c. OFFICE SYMBOL: OR/Gv

DD Form 1473, JUN 86. Previous editions are obsolete.


Approved for public release; distribution is unlimited. 
Fitting and Prediction Uncertainty 


for a
Software Reliability Model 


by 
Thomas E. Dennison 
Lieutenant, United States Navy 


B.S. Chemistry, Villanova University 


Submitted in partial fulfillment 
of the requirements for the degree of 


MASTER OF SCIENCE IN OPERATIONS RESEARCH 
from the 


NAVAL POSTGRADUATE SCHOOL 
March 1992 


ABSTRACT 


The cost of system operational testing is steadily 
increasing. It is desirable for the software manager to know 
if the software is sufficiently well developed or reliable to 
support such testing. Current software reliability models
provide only point estimates of the mean time to next failure
or expected number of errors to occur in additional testing
time.

The goal of this thesis is to take into account prediction
uncertainties of a software reliability model. Bootstrapping
is used to provide the software manager with confidence limits
for the predicted expected number of faults to occur in
additional testing time. The results can be particularly
useful to a software manager who has to answer a subjective
question: is the software reliable enough to support system
operational testing? A range of predicted expected number of
faults will be of more use to a software manager, who has to
justify the answer to this question, than just a point
estimate. Two software fault data sets are analyzed with this
technique, emphasizing how a software manager should analyze
the results.




TABLE OF CONTENTS

I.   INTRODUCTION

II.  SURVEY OF SOFTWARE RELIABILITY METHODOLOGIES
     A. TIME BETWEEN ERRORS (TBE)
        1. Jelinski and Moranda Model
        2. Schick-Wolverton Model
        3. Geometric Model
        4. Use of Time Between Errors (TBE) Models
     B. FAULT COUNT MODELS
        1. Generalized Poisson Model
        2. Non-homogeneous Poisson Process Model
        3. Schneidewind’s Software Reliability Model
        4. Use of Fault Count Models

III. SOFTWARE RELIABILITY MODELS
     A. DATA ANALYSIS
     B. MODEL DEVELOPMENT
     C. BOOTSTRAP
     D. RESULTS
     E. USE OF RESULTS
     F. APPLICATION TO TWO DATA SETS

IV.  CONCLUSION

APPENDIX

REFERENCES

BIBLIOGRAPHY

INITIAL DISTRIBUTION LIST


ACKNOWLEDGMENTS 

I owe a significant amount of gratitude to my advisor,
Professor Donald P. Gaver. His guidance and encouragement
were paramount in the completion of this research. He
broadened my horizons, not only in statistical methods but
also in how to communicate ideas. It would have been very
difficult to complete this research without his advice and
expertise.

I am also grateful to my second reader, Professor Timothy
J. Shimeall, for his willingness to evaluate this research.

Finally, but by no means less important, I am deeply
indebted to my wife, Janice, who, at the same time that this
research was being conducted, endured a difficult pregnancy,
compensated for my slack in parental duties during the many
days I spent away from home, and managed a career of her
own. There is absolutely no possible way I could have
completed this research without her encouragement and
uncompromising, loving support. She is a true blessing. To
my children, Jennifer, Sean, and Ryan, I thank you for your
smiling, cheery faces that lifted my spirits after many days
away from home.




I. INTRODUCTION

Prior to costly operational testing of a system consisting 
of hardware and its embedded software, it would be highly 
desirable to know whether these two major components are 
sufficiently reliable to support such testing. Specifically, 
this is equivalent to asking whether the software has reached 
a state of maturity such that unforeseen faults (bugs, errors, 
system crashes, etc.) are not likely to occur during 
operational test of the entire system or, later, during an
actual mission.

Estimation of hardware reliability is relatively well
understood. Unfortunately, software reliability or maturity
prediction is not as well understood at this time. The
ANSI/IEEE definition of software reliability is the ability of
a program to perform a required function under stated
conditions for a stated period of time (IEEE, 1984). Because
testing software has a cost, whether in computer run time,
labor, lost market share resulting from late delivery of a
product or, in the case of military equipment, sacrificed
range-testing time and aborted missions, only a finite time
can be allocated for testing and removal of faults (bugs). A
moderate-sized program with 264 branches would have $2^{264}$
independent paths (greater than the estimated number of atoms
in the universe). Obviously, it is infeasible to test each
path (Dalal and Mallows, 1988). Testing and debugging costs
are estimated to range from 50% to 80% of the costs for
development of a working version of software (Beizer, 1984).
The constraints of a finite time period for testing and the
cost of testing are excellent incentives for prompt and
accurate determination of software reliability. Put in the
form of a question: when can testing be stopped and the
product delivered with a high level of confidence that the
customer will be satisfied?

Software reliability estimation is based on the results of 
testing. Software testing can be broken down into four major 
categories: unit, integration, system and regression testing. 
Unit testing is usually done by the programmer in an informal
manner. Integration testing is done in an orderly progression 
such that the software elements are combined and tested until 
the entire software package has been tested. System testing 
is integration of hardware and software to verify that the 
system meets specified requirements. Regression testing is 
retesting to detect faults that may have been introduced 
during program modification (Hernandez, 1989). One purpose of 
testing is to produce quantitative measures of software error- 
proneness after effort has been expended in the integration 
testing, system testing, and fault removal phases. 

Software testing, a follow-on to hardware reliability
prediction, has been of considerable importance and interest
from the mid-1960s to the present. The Navy’s Operational
Test and Evaluation Force recently (January, 1992) held a
symposium for DoD agencies to discuss and exchange ideas and 
methodologies on software testing and reliability. There are 
two basic differences between hardware and software 
reliability predictions. Hardware prediction usually assumes 
independence of failures, and, after some point, the 
reliability measuring process does not affect the failure 
rate. Software reliability prediction models should assume 
interdependence of unit failures, and that testing improves 
reliability. Removing a program fault or bug during 
developmental testing reduces the likelihood that a fault will 
become operative later in an operational setting that will 
cause a mission to abort. The software fault-prevalence and 
appearance prediction problem has been judged to be inherently 
more difficult than hardware reliability prediction (Beizer, 
1984). 

There are several software reliability models that will be
discussed later. Beizer, in his seminal work Software System
Testing and Quality Assurance (Beizer, 1984), summed up the
similarities of the models best.

1. Most models assume a fixed but unknown number of
faults when testing.

2. Faults are universally assumed to be independent (some
of the later models, Schneidewind’s Software Reliability
Model for example, do not necessarily make this
assumption).

3. Most models assume perfect debugging. That is, the
debugging process introduces no new faults. However, some
of the later models take into account that not all
detected faults will be fixed, and that the debugging
process itself may introduce new faults (Littlewood and
Verrall’s Bayesian Reliability Growth Model takes into
account imperfect debugging).

4. Most models assume that test time and calendar time
are the same.

5. The models assume that failure rate is proportional to
the faults remaining. This implicitly means that faults
are assumed to cause single failures and each failure can
be related to one fault.

6. The models assume path homogeneity. That is, data are
entered randomly and such data uniformly exercise all
code. This is in direct contradiction to the reality that
most paths cover a small percentage (say, under 10%) of
the code.

The difference between the models lies in the degree to
which these assumptions hold true, i.e., the type of random
process according to which the failures occur, and how the
data are fitted to the models (Beizer, 1984).

The models that are described in Chapter II do not
necessarily perform well for all types of data. There is no
"silver bullet" (Brooks, 1986) that will take on all comers
successfully. One model may predict reliability well for one
data source but not another. The users of the models must
take into consideration the predictive quality of a model
prior to basing decisions on the output of the model (Abdalla
et al., 1986; Goel, 1985). One possible way to do this is to
analyze the data using various models. The manager selects
the model that demonstrates the best predictive qualities,
i.e., the model that appears to best fit the data and provide
useful results. The choice is difficult because it is made
in an atmosphere of uncertainty.

Our hypothesis is that software reliability can be
predicted, but with error. It is important to take account of
the variabilities and uncertainties that are inevitably
present, at least those associated with sampling (finite
data); the most serious errors, however, may be associated
with model choice. To test this hypothesis of predictability
we analyze sources of fault (error or bug) data using a
modification of the BELLCORE MODEL (Dalal and Mallows, 1988)
to estimate the reliability of the particular software project
and the quality of the prediction produced by the model.
Parametric estimates are made by maximum likelihood and also
by use of an approximate Bayesian technique. Error estimates
are made by a re-sampling technique known as bootstrapping.
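To make the idea of bootstrap error estimates concrete, here is a minimal parametric-bootstrap sketch. It is not the procedure of this thesis (the modified BELLCORE model is fitted in Chapter III); purely for illustration it assumes that fault counts in equal-length test intervals are independent Poisson counts with a constant rate, and the function names and data are hypothetical. The model is fitted by maximum likelihood, synthetic data sets are simulated from the fit and refitted, and percentiles of the refitted predictions serve as confidence limits for the expected number of faults in future intervals.

```python
import math
import random


def poisson_draw(lam, rng):
    """One Poisson(lam) variate via Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def bootstrap_fault_limits(counts, horizon, n_boot=2000, level=0.90, seed=1):
    """Point prediction and parametric-bootstrap percentile limits for the
    expected number of faults in the next `horizon` intervals.

    Hypothetical model: counts in equal-length intervals are i.i.d. Poisson
    with a constant rate (no reliability growth)."""
    rng = random.Random(seed)
    n = len(counts)
    lam_hat = sum(counts) / n              # MLE of the Poisson rate
    point = horizon * lam_hat              # point prediction
    preds = []
    for _ in range(n_boot):
        sample = [poisson_draw(lam_hat, rng) for _ in range(n)]  # simulate from the fit
        lam_star = sum(sample) / n         # refit the model to the bootstrap data set
        preds.append(horizon * lam_star)
    preds.sort()
    tail = (1.0 - level) / 2.0
    lo = preds[int(tail * n_boot)]
    hi = preds[min(n_boot - 1, int((1.0 - tail) * n_boot))]
    return point, lo, hi


counts = [4, 6, 3, 5, 4, 7, 5, 4]          # hypothetical weekly fault counts
point, lo, hi = bootstrap_fault_limits(counts, horizon=4)
print(f"expected faults in next 4 weeks: {point:.1f} (90% limits {lo:.1f} to {hi:.1f})")
```

A real analysis would replace the constant-rate fit with a reliability-growth model such as those surveyed in Chapter II; the resample-and-refit loop stays the same.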

The parametric bootstrap technique was used in the
aftermath of the Challenger disaster to analyze the O-rings
that failed. Although that analysis was done on hardware, the
methodology that we propose in Chapter III and the appendix is
similar. Using bootstrap 90% confidence limits, the analysis of
the O-rings showed an expected catastrophic failure rate of at
least 13% at temperatures below 31 degrees, but less than a 2%
failure rate at temperatures above 60 degrees (Dalal et al.,
1989). Had NASA decision makers had this information available,
the option of postponing the launch might have been taken more
seriously and the disaster prevented. The analogous question
for the software manager is whether the predicted number of
faults to occur in some specified time is acceptable. It is
hoped that a wrong decision here would not have consequences
as severe as the Challenger disaster. The techniques that we
describe provide a quantitative tool for the software manager
to substantiate the decision to schedule (or postpone) system
operational testing.

In Chapter II, we briefly describe several software
reliability prediction models that have been proposed, in order
to provide a basis for understanding the discussion. In
Chapter III and the appendix, we present the model fitting
procedure, the method used to determine the quality of the
prediction, the resulting data obtained from the analysis, and
methods to improve this methodology from the perspective of a
software manager. In Chapter IV, our conclusions are provided
and directions for future research are suggested.


II. SURVEY OF SOFTWARE RELIABILITY METHODOLOGIES

This survey is concerned with only two categories of
software reliability models: those for time between errors
(TBE), and for fault count (number of errors in a specified
time).


A. TIME BETWEEN ERRORS (TBE) 

TBE reliability assessments attempt to predict the mean
time between failures (MTBF) preceding the ith failure, based
on the failure times observed up to the (i-1)st failure. The
TBE can be measured in either central processing unit (CPU)
time or wall-clock time. Wall-clock time can be misleading: it
can elapse regardless of whether or not the program is
running. From this information the software manager can gain
confidence that the software will exhibit the operational
capability to complete its mission: to operate without failure
for a mission time. A system that experiences multiple, severe
software errors that prevent the system from completing its
operational mission is not ready for costly live exercises
such as operational testing. For example, a system that is
supposed to detect, track and engage a missile during a
scenario of five minutes’ duration, but whose software
experiences a severe fault every thirty seconds on average, is
obviously not ready to conduct an expensive live exercise or
actual mission. Here are some models that attempt to predict
the mean (average) time to failure.
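The missile example can be quantified with a short calculation. As a sketch, assume (consistent with the constant fault rate between faults assumed by the first model below) that the time to the next severe fault is exponentially distributed with mean equal to the MTBF; the probability of completing a mission of length $T$ without a severe fault is then $e^{-T/\mathrm{MTBF}}$. The function name is hypothetical.

```python
import math


def prob_no_failure(mission_time, mtbf):
    """P(no failure during mission_time), assuming exponentially distributed
    times between failures with mean mtbf (constant failure rate 1/mtbf)."""
    return math.exp(-mission_time / mtbf)


# Five-minute (300 s) scenario, one severe fault every 30 s on average.
p = prob_no_failure(mission_time=300.0, mtbf=30.0)
print(f"{p:.2e}")  # 4.54e-05
```

Under this assumption the system would get through the scenario without a severe fault fewer than 5 times in 100,000 attempts.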
1. Jelinski and Moranda Model 
Jelinski and Moranda developed the "De-Eutrophication 
Model" (Moranda and Jelinski, 1972), (Farr, 1983). The 
assumptions are: 
• The rate of fault detection is proportional to the current
fault content of a program.
• All faults are equally likely to occur and are independent
of each other.
• Each fault is of the same severity as any other fault.
• The fault rate remains constant over the interval between
fault occurrences.
• The software is operated in a manner similar to
anticipated operational usage.
• The faults are corrected instantly, without introduction
of new faults into the program.


The hazard rate for the ith fault is

$$ Z(t_i) = \phi\,[N-(i-1)] \tag{2.1} $$

where: $N$ = total number of faults initially in the system
       $i$ = the ith fault occurrence
       $\phi$ = proportionality constant.

$X_i = t_i - t_{i-1}$ is the time between the ith and the (i-1)st fault
and is assumed to have an exponential distribution with rate
$Z(t_i)$:

$$ f(X_i) = \phi\,[N-(i-1)]\,e^{-\phi[N-(i-1)]X_i} \tag{2.2} $$

The likelihood function for the parameters $\phi$ and $N$ is

$$ L(X_1,\dots,X_n) = \prod_{i=1}^{n} \phi\,[N-(i-1)]\,\exp\{-\phi\,[N-(i-1)]\,X_i\} \tag{2.3} $$


Taking the partial derivatives of $\ln L$ with respect to $N$ ($N$
is allowed to assume any real value as a convenient
approximation) and $\phi$, and setting them equal to zero, gives
the following equations, whose solutions are the maximum
likelihood estimates of $N$ and $\phi$ ($\hat N$ is found by numerical
techniques and then used to solve for $\hat\phi$):
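$$ \hat\phi = \frac{n}{\hat N \sum_{i=1}^{n} X_i - \sum_{i=1}^{n} (i-1)\,X_i} \tag{2.4} $$

$$ \sum_{i=1}^{n} \frac{1}{\hat N-(i-1)} = \frac{n}{\hat N - \dfrac{1}{\sum_{i=1}^{n} X_i}\sum_{i=1}^{n} (i-1)\,X_i} \tag{2.5} $$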
The estimate of the mean time between failures (MTBF) for the
(i+1)st fault occurrence is

$$ \widehat{MTBF}_{i+1} = \frac{1}{\hat Z(t_{i+1})} = \frac{1}{\hat\phi\,(\hat N-i)} \tag{2.6} $$

The data required to use the Jelinski-Moranda model are the
observed times of fault occurrence (the $t_i$'s), or the times
between the faults (the $X_i$'s).
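Because (2.5) involves $\hat N$ alone, the estimates can be computed with a one-dimensional root search. The Python sketch below (function name and data are hypothetical) brackets the root of (2.5), bisects it, and then obtains $\hat\phi$ from (2.4) and the MTBF estimate from (2.6). It relies on $\sum(i-1)X_i / \sum X_i > (n-1)/2$, i.e., on the times between faults tending to grow; the sketch checks that condition and raises an error otherwise.

```python
import math


def jelinski_moranda_mle(x):
    """MLEs (N_hat, phi_hat) for the Jelinski-Moranda model.

    x[k] holds X_{k+1}, the time between faults k and k+1 (k = 0..n-1).
    N is treated as a real number, as in the text."""
    n = len(x)
    sx = sum(x)                                   # sum of X_i
    sw = sum(k * xk for k, xk in enumerate(x))    # sum of (i-1) X_i
    if sw / sx <= (n - 1) / 2:
        raise ValueError("times between faults show no growth: root of (2.5) not bracketed")

    def g(N):
        # Eq. (2.5): left side minus right side
        return sum(1.0 / (N - k) for k in range(n)) - n / (N - sw / sx)

    lo, hi = n - 1 + 1e-9, float(n)   # g(lo) > 0: the term 1/(N-(n-1)) is huge
    while g(hi) > 0:                  # widen the bracket until g changes sign
        hi *= 2.0
    for _ in range(200):              # bisection
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    N_hat = 0.5 * (lo + hi)
    phi_hat = n / (N_hat * sx - sw)   # Eq. (2.4)
    return N_hat, phi_hat


x = [7, 11, 8, 10, 15, 22, 20, 25, 28, 35]   # hypothetical times between faults
N_hat, phi_hat = jelinski_moranda_mle(x)
n = len(x)
# Eq. (2.6) with i = n; if N_hat <= n the fit predicts no remaining faults
mtbf_next = 1.0 / (phi_hat * (N_hat - n)) if N_hat > n else math.inf
print(f"N_hat={N_hat:.2f}  phi_hat={phi_hat:.5f}  MTBF_{n + 1}={mtbf_next:.1f}")
```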
2. Schick-Wolverton Model 
The hazard rate for the Schick-Wolverton model
(Schick and Wolverton, 1978) and (Farr, 1983) is proportional
to the number of faults in the program and the amount of
testing time. An assumption of the model is that as more
testing is completed the probability of detecting faults
increases because of "zeroing-in" on the areas of code where
the errors lie. The assumptions are:

• The rate of fault detection is proportional to the current
fault content and to the amount of time expended in
testing.
• All faults are equally likely to occur.
• All faults are independent of each other.
• All faults are of the same severity.
• The software is operated in a manner similar to the
anticipated operational usage.
• Perfect fault correction occurs.


The hazard function is

$$ Z(X_i) = \phi\,[N-(i-1)]\,X_i \tag{2.7} $$

where: $X_i$ = the amount of time spent testing between the
       occurrence of the ith and the (i-1)st fault
       $N$ = total number of faults initially in the program
       $\phi$ = proportionality constant.

The reliability function of $X_i$ is

$$ R(X_i) = \exp\left(-\phi\,[N-(i-1)]\,\frac{X_i^2}{2}\right) \tag{2.8} $$

The density function of $X_i$ is

$$ f(X_i) = \phi\,[N-(i-1)]\,X_i \exp\left(-\phi\,[N-(i-1)]\,\frac{X_i^2}{2}\right) \tag{2.9} $$
If $X_i^2/2$ is replaced by $Y_i$, the model is formally identical to
the Jelinski-Moranda model previously described. Substitution
of any known function of $X_i$ allows transformation to the
Jelinski-Moranda model. $N$ and $\phi$ are estimated by MLEs:

$$ \hat\phi = \frac{2n}{\sum_{i=1}^{n} [\hat N-(i-1)]\,X_i^2} \tag{2.10} $$

$$ \sum_{i=1}^{n} \frac{1}{\hat N-(i-1)} = \hat\phi \sum_{i=1}^{n} \frac{X_i^2}{2} \tag{2.11} $$

The estimate of the mean time between failures (MTBF) for the
(i+1)st occurrence is

$$ \widehat{MTBF}_{i+1} = \sqrt{\frac{\pi}{2\hat\phi\,(\hat N-i)}} \tag{2.12} $$

The data requirements are the time of the fault occurrence, $t_i$,
or the time between the ith and (i-1)st fault.
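Equations (2.10) and (2.11) can be solved with a one-dimensional root search. In this sketch (function name and data are hypothetical), (2.10) is substituted into (2.11), leaving a single equation in $\hat N$ that is bisected. Because of the substitution $Y_i = X_i^2/2$, that equation has the Jelinski-Moranda form in the $Y_i$, so the sketch checks the corresponding growth condition $\sum(i-1)Y_i/\sum Y_i > (n-1)/2$ before searching. The MTBF then follows from (2.12).

```python
import math


def schick_wolverton_mle(x):
    """MLEs (N_hat, phi_hat) for the Schick-Wolverton model from testing
    times x[k] = X_{k+1} between successive faults; N is treated as real."""
    n = len(x)
    y = [xk * xk / 2.0 for xk in x]                # Y_i = X_i^2 / 2
    sy = sum(y)
    sw = sum(k * yk for k, yk in enumerate(y))     # sum of (i-1) Y_i
    if sw / sy <= (n - 1) / 2:
        raise ValueError("root of (2.11) not bracketed for these data")

    def phi_of(N):
        # Eq. (2.10): 2n / sum [N-(i-1)] X_i^2, which equals n / sum [N-(i-1)] Y_i
        return n / (N * sy - sw)

    def h(N):
        # Eq. (2.11): left side minus right side
        return sum(1.0 / (N - k) for k in range(n)) - phi_of(N) * sy

    lo, hi = n - 1 + 1e-9, float(n)    # h(lo) > 0: the term 1/(N-(n-1)) is huge
    while h(hi) > 0:                   # widen the bracket until h changes sign
        hi *= 2.0
    for _ in range(200):               # bisection
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    N_hat = 0.5 * (lo + hi)
    return N_hat, phi_of(N_hat)


x = [3.0, 4.0, 3.5, 5.0, 6.0, 7.5, 7.0, 9.0]   # hypothetical test hours between faults
N_hat, phi_hat = schick_wolverton_mle(x)
n = len(x)
# Eq. (2.12) with i = n; if N_hat <= n the fit predicts no remaining faults
mtbf_next = math.sqrt(math.pi / (2.0 * phi_hat * (N_hat - n))) if N_hat > n else math.inf
print(f"N_hat={N_hat:.2f}  phi_hat={phi_hat:.5f}  MTBF_{n + 1}={mtbf_next:.2f}")
```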
3. Geometric Model 

The Geometric model (Moranda, 1975) and (Farr, 1983)
is a modification of the Jelinski-Moranda "De-Eutrophication"
model. It differs from that model as follows: it does not
assume a fixed number of faults in the program, and the faults
are not equally likely to occur because, as debugging
progresses, faults become harder to detect. The assumptions
are:
• There is an infinite number of total faults (the program
is never totally fault free).
• Not all faults have the same chance of detection.
• Detections of faults are independent.
• The software is operated in a manner similar to
anticipated operational usage.
• The fault detection rate forms a geometric progression and
is constant between faults.


The hazard rate for the ith fault is

$$ Z(t) = D\,\phi^{\,i-1} \tag{2.13} $$

where: $t$ = time between the (i-1)st and the ith fault
       $D$ = initial hazard rate
       $\phi$ = fault detection rate ($0<\phi<1$)
       $i$ = the ith fault to occur.

$X_i$ = time between the ith and the (i-1)st fault. The $X_i$ are
independently and exponentially distributed with rate $Z(t)$,
so the density function of $X_i$ is

$$ f(X_i) = D\,\phi^{\,i-1}\,e^{-D\phi^{i-1}X_i} \tag{2.14} $$


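$D$ and $\phi$ are estimated by MLEs:

$$ \hat D = \frac{n\,\hat\phi}{\sum_{i=1}^{n} \hat\phi^{\,i} X_i} \tag{2.15} $$

$$ \frac{\sum_{i=1}^{n} i\,\hat\phi^{\,i} X_i}{\sum_{i=1}^{n} \hat\phi^{\,i} X_i} = \frac{n+1}{2} \tag{2.16} $$

Equation (2.16) is solved for $\hat\phi$, and that value is substituted
into (2.15) to find $\hat D$. From these equations the MTBF until
the (n+1)st fault occurs, after n faults have occurred, can be
obtained:

$$ \widehat{MTBF}_{n+1} = E(X_{n+1}) = \frac{1}{\hat D\,\hat\phi^{\,n}} \tag{2.17} $$

The data requirements are the time of the ith fault ($t_i$), or
the time between the faults ($X_i$), for i = 1, 2, ..., n.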
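A minimal sketch for the Geometric model, with hypothetical names and data. The left side of (2.16) is a weighted average of $i$ with weights $\phi^i X_i$, and that average increases with $\phi$, so bisection on $(0, 1)$ finds $\hat\phi$ whenever $\sum iX_i/\sum X_i > (n+1)/2$ (times between faults tending to grow). Then (2.15) gives $\hat D$ and (2.17) the MTBF.

```python
def geometric_mle(x):
    """MLEs (D_hat, phi_hat) for Moranda's Geometric model from times
    between faults x[k] = X_{k+1}, k = 0..n-1 (needs n >= 2)."""
    n = len(x)
    target = (n + 1) / 2.0

    def r(phi):
        # Eq. (2.16): weighted mean of i (weights phi^i X_i) minus (n+1)/2
        w = [phi ** i * xi for i, xi in enumerate(x, start=1)]
        return sum(i * wi for i, wi in enumerate(w, start=1)) / sum(w) - target

    if r(1.0) <= 0:
        raise ValueError("times between faults do not grow: phi_hat not in (0, 1)")
    lo, hi = 1e-6, 1.0           # r(lo) < 0: the weights pile up on i = 1
    for _ in range(200):         # bisection; r increases with phi
        mid = 0.5 * (lo + hi)
        if r(mid) < 0:
            lo = mid
        else:
            hi = mid
    phi_hat = 0.5 * (lo + hi)
    D_hat = n * phi_hat / sum(phi_hat ** i * xi for i, xi in enumerate(x, start=1))  # Eq. (2.15)
    return D_hat, phi_hat


x = [7, 11, 8, 10, 15, 22, 20, 25, 28, 35]   # hypothetical times between faults
D_hat, phi_hat = geometric_mle(x)
n = len(x)
mtbf_next = 1.0 / (D_hat * phi_hat ** n)     # Eq. (2.17)
print(f"D_hat={D_hat:.4f}  phi_hat={phi_hat:.4f}  MTBF_{n + 1}={mtbf_next:.1f}")
```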




4. Use of Time Between Errors (TBE) Models 

The TBE for models in this category can be measured in
either wall-clock time or CPU time. The models may be used to
predict the expected time to the next failure. Confidence
limits on the expected value should be used to obtain a range
for the time to the next failure. The software manager should
be asking: is the expected time to the next failure longer
than the time required for operational testing of the software
within the overall system? If the time required for
operational testing of the system is greater than the mean
time to failure for the (i+1)st failure, then the prudent
software manager should consider postponing operational
testing in favor of continued developmental activity and
testing.
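The comparison described above can be written as a small decision aid. The sketch below is a hypothetical rule of thumb, not a procedure from this thesis: it takes lower and upper confidence limits for the MTBF of the next failure (obtained, for example, by bootstrapping) and the duration of the planned operational test, and reports whether the whole interval lies below the test duration, above it, or straddles it.

```python
def operational_test_readiness(mtbf_lower, mtbf_upper, test_duration):
    """Hypothetical rule of thumb comparing MTBF confidence limits with the
    time required for operational testing (all in the same time units)."""
    if mtbf_lower > mtbf_upper:
        raise ValueError("lower limit exceeds upper limit")
    if mtbf_upper < test_duration:
        return "postpone"    # even the optimistic MTBF is shorter than the test
    if mtbf_lower > test_duration:
        return "proceed"     # even the pessimistic MTBF exceeds the test length
    return "judgment"        # the interval straddles the test duration


print(operational_test_readiness(40.0, 75.0, 120.0))    # postpone
print(operational_test_readiness(150.0, 300.0, 120.0))  # proceed
print(operational_test_readiness(90.0, 200.0, 120.0))   # judgment
```

The middle case is where the manager's judgment, and the width of the confidence limits, matter most.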


B. FAULT COUNT MODELS 
Fault count models use the number of faults that occurred
in a testing interval to determine the expected number of
faults in the next testing interval. Software managers can
employ this method by simply counting the number of faults in
a given test period (e.g., a day, week, or month), provided
test exposures are the same. This provides insight into how
well the testing process is working.
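Counting faults per test period is a simple binning step. A minimal sketch, assuming fault times are recorded in days from the start of testing and the periods are of equal length (the function name and data are hypothetical):

```python
def faults_per_interval(fault_times, interval_length, n_intervals):
    """Number of faults observed in each of n_intervals consecutive,
    equal-length test periods [k*L, (k+1)*L); other times are ignored."""
    counts = [0] * n_intervals
    for t in fault_times:
        k = int(t // interval_length)   # index of the period containing time t
        if 0 <= k < n_intervals:
            counts[k] += 1
    return counts


fault_days = [0.5, 1.2, 2.0, 3.7, 4.1, 9.9, 10.0, 15.5]   # hypothetical fault times (days)
print(faults_per_interval(fault_days, interval_length=7.0, n_intervals=3))  # [5, 2, 1]
```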
1. Generalized Poisson Model 
The Generalized Poisson Model (Schafer et al, 1979), 
(Farr, 1983) is similar to the Jelinski-Moranda and Schick- 
Wolverton models but uses fault count observations in fixed, 
equal-length intervals rather than times between faults. The 
assumptions are: 
• The expected number of faults occurring in any time
interval is proportional to the fault content (number of
bugs remaining) at the time of testing, and to the amount
of time that has been previously spent in testing. The
actual number of faults that appear is assumed to be
Poisson distributed.
• All faults are equally likely to occur and are independent
of each other.
• Each fault is of the same severity.
• The software is operated in a manner similar to the
anticipated operational usage.
• The faults are corrected at the ends of the testing
intervals. (Note: Faults discovered in one test interval
may be corrected at another test interval; the only
restriction is that the fault correction come at the end
of the testing intervals.)
Testing intervals are of length $x_i$, and $f_i$ faults occur during
the ith interval. At the end of the ith interval a total of
$M_i$ faults have been corrected.

The expected number of faults in the ith interval is

$$ E(f_i) = \phi\,(N - M_{i-1})\,g_i(x_1, x_2, \dots, x_i) \tag{2.18} $$

where: $\phi$ = proportionality constant
       $N$ = initial number of faults
       $g_i$ = function of the amount of testing time spent
       previously and currently, and is nondecreasing;
       as testing progresses more faults are found;
       specifically,

$$ g_i(x_1, x_2, \dots, x_i) = x_i^{\alpha} \tag{2.19} $$

where $\alpha$ is assumed known.


f. is Poisson with mean = 6@(N-M,)g;. N and 6 are estimated by 


MLE’s: 
Q=_ dior Fi —_ (3.20) 
Da F~ Deiat 43-193 
5 gs (BoB) 
ia (om,,) it FS | 


fs 


These non-linear equations must be solved for 6 and N. From 
this the expected number of errors in the (n+1)st test 


interval can be obtained, 


E( £,4,) =O (N=M.) g(a ee (2.225 


rAn+1 

where: xX, 18 the anticipated testing time for the (n+1)st 
test interval. 

The data requirements for this model are the lengths of the
test intervals ($x_i$), the total number of faults corrected by
the end of each test interval ($M_i$), and the number of faults
discovered in each interval ($f_i$).
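The estimation step above can be sketched in Python. This is a minimal illustration, not the procedure used in the cited reports: it assumes the specific form $g_i = x_i^{\alpha}$ of (2.19) with $\alpha$ known, eliminates $\hat\phi$ from (2.21) using (2.20), and finds $\hat N$ by bisection. The function names `gpm_fit` and `gpm_predict` are hypothetical, and the bracket assumes the $M_i$ are nondecreasing and $f_n > 0$.

```python
def gpm_fit(x, f, M, alpha):
    """MLEs (N_hat, phi_hat) for the Generalized Poisson model with alpha known.

    x: interval lengths x_1..x_n; f: fault counts f_1..f_n;
    M: cumulative faults corrected by the end of each interval, M_1..M_n
       (assumed nondecreasing). Assumes g_i = x_i ** alpha and f_n > 0.
    """
    g = [xi ** alpha for xi in x]
    M_prev = [0.0] + list(M[:-1])            # M_{i-1}, with M_0 = 0
    F, G = sum(f), sum(g)

    def h(N):
        # Eq. (2.21) after eliminating phi with Eq. (2.20); its root is N_hat.
        lhs = sum(fi / (N - m) for fi, m in zip(f, M_prev))
        return lhs - F * G / sum((N - m) * gi for m, gi in zip(M_prev, g))

    lo = M_prev[-1]                          # h -> +infinity just above M_{n-1}
    hi = lo + 1.0
    while h(hi) > 0:                         # widen the bracket until h changes sign
        hi = lo + 2.0 * (hi - lo)
        if hi > 1e9:
            raise ValueError("no finite MLE for N")
    for _ in range(200):                     # bisection on (lo, hi)
        mid = 0.5 * (lo + hi)
        if not lo < mid < hi:                # bracket has collapsed to adjacent floats
            break
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    N = 0.5 * (lo + hi)
    phi = F / sum((N - m) * gi for m, gi in zip(M_prev, g))   # Eq. (2.20)
    return N, phi


def gpm_predict(N, phi, M_n, x_next, alpha):
    """Eq. (2.22): expected faults in the next interval of length x_next."""
    return phi * (N - M_n) * x_next ** alpha
```

Because (2.20) and (2.21) are the likelihood equations, counts equal to their expected values return the generating $N$ and $\phi$, which makes a convenient check. With real data a finite $\hat N$ may not exist when the counts do not decline as faults are removed, which is why the bracket search raises an error in that case.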

2. Non-homogeneous Poisson Process Model 

The Non-homogeneous Poisson Process Model (NHPP) (Goel and
Okumoto, 1979) and (Farr, 1983) assumes that the fault counts
for testing intervals follow a Poisson distribution. The
rate at which faults are detected is proportional to the
number of faults left in the program. The
assumptions are:


• The software is operated in a manner similar to the
  anticipated operational usage.
• The numbers of faults detected, $f_1, f_2, \ldots, f_n$, in the test
  intervals $(0, t_1], (t_1, t_2], \ldots, (t_{n-1}, t_n]$ are independent
  for any finite collection of times $t_1 < t_2 < \cdots < t_n$.
• Faults are of the same severity.
• Faults are equally likely to be detected.
• The cumulative number of faults detected by time $t$,
  $N(t)$, has a Poisson distribution with mean $m(t)$. The
  mean, $m(t)$, is the expected number of faults to occur in
  the time period $(0, t]$; the expected number detected in a
  small interval after $t$ is proportional to the expected
  number of undetected faults at time $t$.
• $m(t)$ is bounded.


The specific mean function used is

$$m(t) = a\,(1 - e^{-bt}) \tag{2.23}$$

and $f_i$ is the number of faults in the $i$th interval,

$$f_i = N(t_i) - N(t_{i-1}) \tag{2.24}$$

where: $a$ = expected total number of faults to be
            eventually detected
       $b$ = fault detection rate per remaining fault


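$a$ and $b$ can be estimated by MLE’s (with $t_0 = 0$):

$$\hat a = \frac{\sum_{i=1}^{n} f_i}{1 - e^{-\hat b t_n}} \tag{2.25}$$

$$\frac{t_n\, e^{-\hat b t_n} \sum_{i=1}^{n} f_i}{1 - e^{-\hat b t_n}} = \sum_{i=1}^{n} \frac{f_i\,\bigl(t_i e^{-\hat b t_i} - t_{i-1} e^{-\hat b t_{i-1}}\bigr)}{e^{-\hat b t_{i-1}} - e^{-\hat b t_i}} \tag{2.26}$$

From the estimates of $a$ and $b$ the expected number of faults in
the next, $(n+1)$st, test interval is estimated to be

$$\hat m(t_{n+1}) - \hat m(t_n) = \hat a\,\bigl(e^{-\hat b t_n} - e^{-\hat b t_{n+1}}\bigr) \tag{2.27}$$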


The data required for this model are the fault counts of each
test interval ($f_i$) and the end time of each test interval ($t_i$).
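A comparable sketch for the NHPP model solves (2.26) for $\hat b$ by bisection and then applies (2.25) and (2.27). The names are hypothetical. The one numerical step that is an assumption of this sketch, not part of the model, is an algebraic rewrite: each term on the right of (2.26) simplifies to $d_i/(e^{\hat b d_i} - 1) - t_{i-1}$ with $d_i = t_i - t_{i-1}$. `math.expm1` evaluates that form accurately when $\hat b d_i$ is small.

```python
import math


def go_fit(t, f):
    """Grouped-data MLEs (a_hat, b_hat) for m(t) = a * (1 - exp(-b * t)).

    t: interval end times t_1 < ... < t_n (t_0 = 0 is implied);
    f: fault counts f_1..f_n observed in (t_{i-1}, t_i].
    """
    F, tn = sum(f), t[-1]
    t_prev = [0.0] + list(t[:-1])

    def k(b):
        # Eq. (2.26) written as (right side) - (left side). Each right-side term
        # equals d_i / (exp(b d_i) - 1) - t_{i-1}, with d_i = t_i - t_{i-1}.
        rhs = sum(fi * ((ti - tp) / math.expm1(b * (ti - tp)) - tp)
                  for fi, ti, tp in zip(f, t, t_prev))
        return rhs - F * tn / math.expm1(b * tn)

    lo = 1e-6 / tn
    if k(lo) <= 0:
        raise ValueError("counts do not decrease; no finite MLE for b")
    hi = 2 * lo
    while k(hi) > 0:
        hi *= 2
        if hi * tn > 700:                    # exp(b * t_n) would overflow
            raise ValueError("could not bracket the MLE for b")
    for _ in range(200):                     # bisection on (lo, hi)
        mid = 0.5 * (lo + hi)
        if k(mid) > 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    a = F / -math.expm1(-b * tn)             # Eq. (2.25)
    return a, b


def go_predict(a, b, t_n, t_next):
    """Eq. (2.27): expected faults in (t_n, t_next]."""
    return a * (math.exp(-b * t_n) - math.exp(-b * t_next))
```

The check `k(lo) > 0` is the condition for a finite $\hat b$. As $b \to 0$, $k$ approaches half of $t_n\sum f_i - \sum f_i (t_i + t_{i-1})$, which is positive only when faults are concentrated early in testing.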
3. Schneidewind’s Software Reliability Model 
Schneidewind’s model (Schneidewind, 1975) and (Farr,
1983) maintains that as testing progresses the fault detection
process changes. The later fault counts are therefore more useful
in determining future fault counts. The model allows for
three approaches:


1. Utilize all the fault counts from the $m$ intervals.
2. The first $s-1$ intervals are ignored, and only the fault
   counts of intervals $s$ through $m$ are considered.
3. The fault counts of the first $s-1$ intervals are summed, and
   the fault counts of the remaining intervals $s$ through
   $m$ are treated individually. Denote the sum of
   the fault counts in the first $s-1$ intervals by:

$$F_{s-1} = \sum_{i=1}^{s-1} f_i \tag{2.28}$$


Method 1 is used when the analyst feels that all intervals 
will be useful. Method 2 can be used when a significant 
change in the fault detection process has occurred at 
approximately the (s-1)st interval. Method 3 attempts to 
combine the effects of both approaches. The assumptions for 
all methods are the same: 

• The fault counts for each interval are independent of each
  other.
• The fault correction rate is proportional to the number of
  faults to be corrected.
• The software is operated in a manner similar to the
  anticipated operational usage.
• The mean number of detected faults decreases from one
  interval to the next.
• Intervals are all of the same length.
• The rate of fault detection is proportional to the number
  of faults remaining. The fault detection process is
  assumed to be a non-homogeneous Poisson process with an
  exponentially decreasing appearance and detection rate.


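The rate of change of the number of faults detected in the $i$th
interval is

$$d_i = \alpha\, e^{-\beta i} \tag{2.29}$$

The cumulative mean number of faults that occur up to and
including interval $i$ is

$$D_i = \frac{\alpha}{\beta}\,\bigl(1 - e^{-\beta i}\bigr) \tag{2.30}$$

The mean number of faults for the $i$th interval is

$$m_i = D_i - D_{i-1} = \frac{\alpha}{\beta}\,\bigl(e^{-\beta (i-1)} - e^{-\beta i}\bigr) \tag{2.31}$$

$\alpha$ and $\beta$ can be determined by MLE’s:

$$\hat\beta = \ln y \tag{2.32}$$

$$\hat\alpha = \frac{\hat\beta\, F_m}{1 - e^{-\hat\beta m}} \tag{2.33}$$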
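For Method 1, $y$ is the solution to:

$$\frac{1}{y - 1} - \frac{m}{y^m - 1} = \frac{A}{F_m} \tag{2.34}$$

where:

$$A = \sum_{i=0}^{m-s} (s + i - 1)\, f_{s+i} \tag{2.35}$$

(for Method 1, $s = 1$, so that $A = \sum_{i=1}^{m} (i-1) f_i$), and

$$F_m = \sum_{i=1}^{m} f_i \tag{2.36}$$

For Method 2, $y$ is the solution with $y > 1$ (the trivial root $y = 1$ always satisfies the equation) to

$$A y^{m-s+2} - (A + F_{s,m})\, y^{m-s+1} + \bigl((m-s+1) F_{s,m} - A\bigr)\, y + \bigl(A + F_{s,m} - (m-s+1) F_{s,m}\bigr) = 0 \tag{2.37}$$

where:

$$A = \sum_{i=0}^{m-s} i\, f_{s+i} \tag{2.38}$$

$$F_{s,m} = \sum_{i=s}^{m} f_i \tag{2.39}$$

$$\hat\alpha = \frac{\hat\beta\, F_{s,m}}{1 - e^{-\hat\beta (m-s+1)}} \tag{2.40}$$

For Method 3, $y$ is the solution to

$$\frac{(s-1)\, F_{s-1}}{y^{s-1} - 1} + \frac{F_{s,m}}{y - 1} - \frac{m\, F_m}{y^m - 1} = A \tag{2.41}$$

where $A$ is the same as in Method 1 and $F_{s,m}$ is the same as in Method
2. From the MLE’s of $\alpha$ and $\beta$ the expected number of faults in
the $(m+1)$st interval is

$$\hat F_{m+1} = \frac{\hat\alpha}{\hat\beta}\,\bigl(e^{-\hat\beta m} - e^{-\hat\beta (m+1)}\bigr) \tag{2.42}$$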


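The time needed to detect a total number of $M$ faults is

$$t_M = -\frac{1}{\hat\beta}\,\ln\!\left(1 - \frac{\hat\beta M}{\hat\alpha}\right) \tag{2.43}$$

measured in intervals. It is defined only for $M < \hat\alpha/\hat\beta$, the expected total number of faults.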
The data needed for this model are the fault counts for each
interval and a history of the testing process, in order to
determine the interval at which testing procedures may have
changed significantly.
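For Method 1, the left side of (2.34) is the mean of a geometric distribution on $\{0, 1, \ldots, m-1\}$ with ratio $1/y$. It therefore decreases steadily in $\beta = \ln y$, from $(m-1)/2$ toward 0. A root with $y > 1$ exists only when $A/F_m < (m-1)/2$, i.e., when the counts decline on average. The sketch below uses that monotonicity for a bisection search and then evaluates (2.33), (2.42), and (2.43). It covers Method 1 only, measures time in intervals, and its function names are hypothetical.

```python
import math


def schneidewind_m1(f):
    """Method 1 MLEs (alpha_hat, beta_hat) from counts f_1..f_m in equal intervals."""
    m, Fm = len(f), sum(f)
    target = sum(i * fi for i, fi in enumerate(f)) / Fm   # A / F_m, Eq. (2.35) with s = 1
    if not 0 < target < (m - 1) / 2:
        raise ValueError("counts do not decrease on average; no finite MLE")

    def q(beta):
        # Left side of Eq. (2.34) with y = exp(beta). The second term is
        # negligible (and expm1 would overflow) once m * beta is large.
        tail = m / math.expm1(m * beta) if m * beta < 700 else 0.0
        return 1 / math.expm1(beta) - tail

    lo, hi = 1e-9, 1.0
    while q(hi) > target:                    # q decreases from (m-1)/2 toward 0
        hi *= 2
        if hi > 700:
            raise ValueError("could not bracket beta")
    for _ in range(200):                     # bisection on (lo, hi)
        mid = 0.5 * (lo + hi)
        if q(mid) > target:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)                   # Eq. (2.32): beta = ln y
    alpha = beta * Fm / -math.expm1(-beta * m)   # Eq. (2.33)
    return alpha, beta


def schneidewind_predict(alpha, beta, m):
    """Eq. (2.42): expected faults in interval m + 1."""
    return alpha / beta * (math.exp(-beta * m) - math.exp(-beta * (m + 1)))


def schneidewind_time_to(alpha, beta, M):
    """Eq. (2.43): intervals until M faults are expected in total."""
    if M >= alpha / beta:
        raise ValueError("M must be below the expected total alpha / beta")
    return -math.log(1 - beta * M / alpha) / beta
```

Searching in $\beta$ rather than in the polynomial form (2.37) avoids the spurious root at $y = 1$.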
4. Use of Fault Count Models 

Fault count models use the number of faults that occur 
in some testing interval. The models in this category predict 
the expected number of faults to occur in some additional time 
interval. Confidence limits on the expected number should be 
used to obtain a range of the predicted number of faults to 
occur for that time interval. Since there can never be a one 
hundred percent guarantee of perfect software, the software 
manager should be asking: is the predicted number of faults 
to occur for the time interval of interest acceptable for 
operational testing? If the predicted number of faults to
occur is too great, then the prudent software manager should
postpone operational testing in favor of continued
developmental activity and testing.


C. SOFTWARE RELIABILITY MODELS 
The number of software reliability models continues to
grow. Assumptions have broadened to reflect the reality of
the software development process with increased accuracy. The
assumptions of some of the models described appear to be limiting.
The assumption that all faults have the same severity can be
worked around by modeling faults according to severity. The
assumption that all faults are equally likely to occur and
independent of each other can be relaxed by assuming that low
severity faults occur more frequently than high severity faults,
while faults of the same severity class are considered equally
likely to occur. The assumption of instantaneous fault correction
can be avoided by not counting faults that were previously
detected (and counted at the time of initial detection) but were
not corrected (Farr, 1983).

Software managers need to be aware of the limitations and
underlying assumptions of the various models that
are available. The data needed to fit the models are
critical to reliable results. The data collection needs to be
an accurate reflection of the meaningful historical testing of
the software. Data that should be collected include
computer usage time, testing intensity, the extent of the software
that was tested (was the entire system tested or just a
particular module?), milestones in the software’s
development (were requirements changed or added midway through
the development of the software?) and, of course, the cost of
testing.

This study illustrates the use of a particular reliability
model. Some of the specific questions that this thesis
addresses are: How is a software reliability model used?
What type of information does a model require? What kind of
decision can a software manager make based on the results of
the reliability model?

In today’s fiscal environment software managers should
have a "warm fuzzy feeling" substantiated by quantitative
results for their product prior to initiating costly
full-scale, live operational testing.


III. DATA ANALYSIS 


A. MODEL DEVELOPMENT 

The model that is applied in this thesis is based on the
assumption that the rate of error occurrence is a
non-stationary Poisson process (NSPP) (Dalal and Mallows, 1988).
The model is identical to the Schneidewind model, and is
fitted according to Method 1, which assumes that all fault
data are of equal value. Let $N(t)$ be the number of faults that
occur in $(0, t]$, where $t$ is software running time. The
probability that $n$ faults occur by time $t$ is
given by:

$$P\{N(t) = n\} = \frac{e^{-\Lambda(t)}\,\Lambda(t)^n}{n!}, \qquad n = 0, 1, 2, \ldots \tag{3.1}$$

where $\Lambda(t) = \lambda(1 - e^{-\mu t})$. A test time, $t_s$, was chosen. This length
of time is divided into periods of length $\Delta = t_s/J$, where $J$ is
the total number of intervals. The $j$th interval is
$(j-1)\Delta < t \le j\Delta$. The number of observed counts (faults) in the
$j$th interval is $n_j$. The probability distribution for the
number of faults in $((j-1)\Delta, j\Delta]$ is

$$P\{N_j = N(j\Delta) - N((j-1)\Delta) = n_j\} = \frac{e^{-\Lambda_j}\,\Lambda_j^{n_j}}{n_j!}, \qquad n_j = 0, 1, 2, \ldots \tag{3.2}$$

where

$$\Lambda_j = E[N_j] = \lambda\bigl(1 - e^{-\mu j\Delta}\bigr) - \lambda\bigl(1 - e^{-\mu (j-1)\Delta}\bigr) \tag{3.2a}$$

$$= \lambda\, e^{-\mu (j-1)\Delta}\bigl(1 - e^{-\mu\Delta}\bigr) \tag{3.2b}$$


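The parameters $\mu$ and $\lambda$ are estimated by maximum likelihood.
The likelihood function is

$$L(\lambda, \mu) = \prod_{j=1}^{J} \frac{e^{-\Lambda_j}\,\Lambda_j^{n_j}}{n_j!} \tag{3.3}$$

The natural log of $L(\lambda, \mu)$ is

$$l(\lambda, \mu) = \ln L = -\sum_{j=1}^{J} \Lambda_j + \sum_{j=1}^{J} n_j \ln \Lambda_j - \sum_{j=1}^{J} \ln(n_j!) \tag{3.4}$$

where $\sum_{j} \Lambda_j = \lambda(1 - e^{-\mu t_s})$. The partial derivatives of $l(\lambda, \mu)$ with respect to $\lambda$ and $\mu$ are
taken and set equal to zero. This allows $\hat\lambda$ to be written in
terms of $\hat\mu$ and $n(t_s)$, the total number of counts to occur up
to time $t_s$, as

$$\hat\lambda = \frac{n(t_s)}{1 - e^{-\hat\mu t_s}} \tag{3.5}$$

$\hat\lambda$ is substituted into the partial derivative of $l$ with respect
to $\mu$ to give

$$n(t_s)\left(\frac{\Delta\, e^{-\hat\mu\Delta}}{1 - e^{-\hat\mu\Delta}} - \frac{t_s\, e^{-\hat\mu t_s}}{1 - e^{-\hat\mu t_s}}\right) = \Delta M \tag{3.6}$$

where

$$M = \sum_{j=1}^{J} (j-1)\, n_j \tag{3.7}$$

$\hat\mu$ can now be solved for from the following equation:

$$\frac{e^{-\hat\mu\Delta}}{1 - e^{-\hat\mu\Delta}} - \frac{(t_s/\Delta)\, e^{-\hat\mu t_s}}{1 - e^{-\hat\mu t_s}} = \frac{M}{n(t_s)} \tag{3.8}$$

This equation closely resembles Schneidewind’s Method 1 result; see
(2.34). Since $t_s = \Delta J$, equation (3.8) becomes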


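$$\frac{e^{-\hat\mu\Delta}}{1 - e^{-\hat\mu\Delta}} - \frac{J\, e^{-\hat\mu\Delta J}}{1 - e^{-\hat\mu\Delta J}} = \frac{M}{n(t_s)} \tag{3.9}$$

then,

$$\frac{1}{e^{\hat\mu\Delta} - 1} - \frac{J}{e^{\hat\mu\Delta J} - 1} = \frac{M}{n(t_s)} \tag{3.10}$$

By letting $x = e^{-\hat\mu\Delta}$, equation (3.10) becomes

$$\frac{x}{1 - x} - \frac{J\, x^J}{1 - x^J} = \frac{M}{n(t_s)} \tag{3.11}$$

$x$ is solved for iteratively. Let $x^J = 0$ for the first iteration
(since $0 < x < 1$, $x^J$ is small when $J$ is large); then

$$r(1) = \frac{M}{n(t_s)} \tag{3.12}$$

$$x(1) = \frac{r(1)}{1 + r(1)} \tag{3.13}$$

$r(2)$ is

$$r(2) = r(1) + \frac{J\, x(1)^J}{1 - x(1)^J} \tag{3.14}$$

$x(2)$ is given by,

$$x(2) = \frac{r(2)}{1 + r(2)} \tag{3.15}$$

Hence the iteration of $r(n)$ and $x(n)$ is

$$r(n+1) = r(1) + \frac{J\, x(n)^J}{1 - x(n)^J} \tag{3.16}$$

$$x(n+1) = \frac{r(n+1)}{1 + r(n+1)} \tag{3.17}$$

The iterative process continues until $|x(n+1) - x(n)| < \epsilon$, where
$\epsilon$ is a suitably small number. Then $\hat\mu = -\ln x(n+1)/\Delta$, and,
since $e^{-\hat\mu t_s} = x(n+1)^J$, substituting into equation (3.5) gives $\hat\lambda$.
Using the estimates of $\mu$ and $\lambda$, the expected number of faults to be
observed in some additional operating time $t_o$, where $(t_s, t_s + t_o]$ is of
length $k\Delta$, can be estimated as

$$\hat E[N(t_s + t_o) - N(t_s)] = \hat\lambda\, e^{-\hat\mu t_s}\bigl(1 - e^{-\hat\mu k\Delta}\bigr) \tag{3.19}$$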
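The fixed-point iteration (3.12) to (3.17) and the prediction (3.19) translate directly into code. The sketch below is an illustration under stated assumptions: `fit_nspp` and `expected_new_faults` are hypothetical names, and the counts are supplied per period of length Δ. As in the text, the iteration is used as is. It converges when the map defined by (3.16) and (3.17) is a contraction, which holds when $x^J$ is small; otherwise a bracketing solver applied to (3.11) would be a safer fallback.

```python
import math


def fit_nspp(counts, delta, eps=1e-12, max_iter=10_000):
    """Fit Lambda(t) = lam * (1 - exp(-mu * t)) to period counts n_1..n_J.

    Uses the fixed-point iteration of Eqs. (3.12)-(3.17) and returns
    (lam_hat, mu_hat).
    """
    J, n = len(counts), sum(counts)
    M = sum(j * nj for j, nj in enumerate(counts))   # Eq. (3.7): sum of (j-1) n_j
    r1 = M / n                                       # Eq. (3.12)
    x = r1 / (1 + r1)                                # Eq. (3.13), x^J taken as 0
    for _ in range(max_iter):
        r = r1 + J * x**J / (1 - x**J)               # Eq. (3.16)
        x_new = r / (1 + r)                          # Eq. (3.17)
        if abs(x_new - x) < eps:
            mu = -math.log(x_new) / delta            # x = exp(-mu * delta)
            lam = n / (1 - x_new**J)                 # Eq. (3.5) with exp(-mu t_s) = x^J
            return lam, mu
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")


def expected_new_faults(lam, mu, t_s, t_o):
    """Eq. (3.19): estimated E[N(t_s + t_o) - N(t_s)]."""
    return lam * math.exp(-mu * t_s) * -math.expm1(-mu * t_o)
```

The counts may be any nonnegative numbers. When they equal the expected values $\Lambda_j$ of (3.2b), the fit returns the generating $\lambda$ and $\mu$, which is a useful sanity check before applying it to observed data.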


A Bayesian methodology is discussed in the appendix. This
method attempts to utilize past experience from software
projects with characteristics similar to those of the software in
question. If the distributions of $\lambda$ and $\mu$ are known from
experience, then this information can be useful in estimating
the parameters $\lambda$ and $\mu$.


B. BOOTSTRAP 

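Bootstrapping was used to obtain the confidence limits for
$\hat\mu$, $\hat\lambda$, and $E[N(t_s + t_o) - N(t_s)] = E[\Delta N(t_o)]$. This technique takes
into account the sampling uncertainties in the estimates while
avoiding the errors of the standard (large-sample) approximation (Dalal et
al, 1989) and (Efron, 1985). To obtain estimates of the
sampling variability of $\hat\mu$, $\hat\lambda$, and $E[\Delta N(t_o)]$, we
proceed as follows. Conditional on $N_1 + N_2 + \cdots + N_J = n(t_s)$, the counts
in the $J$ periods are multinomial:

$$P\{N_1 = n_1, \ldots, N_J = n_J \mid N_1 + N_2 + \cdots + N_J = n(t_s)\} \tag{3.20a}$$

$$= \frac{n(t_s)!}{\prod_{j=1}^{J} n_j!}\, \prod_{j=1}^{J} \left(\frac{\Lambda_j}{\sum_{k=1}^{J} \Lambda_k}\right)^{n_j} \tag{3.20b}$$

where $\sum_{k=1}^{J} \Lambda_k = \lambda(1 - e^{-\mu t_s})$. From this the probability that a count falls
in the $j$th interval is

$$p_j = \frac{e^{-\mu (j-1)\Delta}\,\bigl(1 - e^{-\mu\Delta}\bigr)}{1 - e^{-\mu t_s}}$$

which does not depend on $\lambda$; in the bootstrap it is evaluated at $\hat\mu$.
Let $P_j = p_1 + \cdots + p_j$ (with $P_0 = 0$). Uniform(0,1) random numbers $U_k$
were generated, where $k = 1, 2, \ldots, n(t_s)$; if $P_{j-1} < U_k \le P_j$, then
a count is added to $n_j$. The simulated $n_j$’s were then used to
re-estimate $\hat\mu$, $\hat\lambda$, and $E[\Delta N(t_o)]$; these are the bootstrap
values. This process was repeated 1000 times to get a range
of values for $\hat\mu$, $\hat\lambda$, and $E[\Delta N(t_o)]$. To create 90% confidence
limits for the estimate $E[\Delta N(t_o)]$, the 1000 bootstrap estimates
of $E[\Delta N(t_o)]$ were ordered and the 50th and 950th values
(the 5th and 95th percentiles) were found. These are quoted as the 90% confidence
region $\bigl(E[\Delta N(t_o)]_{.05},\ E[\Delta N(t_o)]_{.95}\bigr)$.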
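The resampling scheme above can be sketched as follows. It illustrates the method and is not the program used for Tables 1 to 6. The uniform numbers come from Python's seeded `random.Random`. The cumulative probabilities $P_j$ are searched with `bisect.bisect_left`, which returns the $j$ with $P_{j-1} < U_k \le P_j$. `fit_nspp` repeats the fixed-point fit of (3.12) to (3.17) so the block stands alone. All function names are hypothetical.

```python
import bisect
import math
import random


def fit_nspp(counts, delta, eps=1e-12, max_iter=10_000):
    """Fixed-point fit of Eqs. (3.12)-(3.17); returns (lam_hat, mu_hat)."""
    J, n = len(counts), sum(counts)
    r1 = sum(j * nj for j, nj in enumerate(counts)) / n   # M / n(t_s)
    x = r1 / (1 + r1)
    for _ in range(max_iter):
        r = r1 + J * x**J / (1 - x**J)
        x_new = r / (1 + r)
        if abs(x_new - x) < eps:
            return n / (1 - x_new**J), -math.log(x_new) / delta
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")


def bootstrap_ci(counts, delta, t_o, B=1000, seed=1):
    """Percentile 90% limits for E[N(t_s + t_o) - N(t_s)] by multinomial resampling."""
    rng = random.Random(seed)
    J, n = len(counts), int(round(sum(counts)))
    t_s = J * delta
    _, mu = fit_nspp(counts, delta)
    # Cumulative P_j built from p_j evaluated at mu_hat (p_j does not involve lambda).
    P, acc = [], 0.0
    for j in range(1, J + 1):
        acc += math.exp(-mu * (j - 1) * delta) * -math.expm1(-mu * delta) / -math.expm1(-mu * t_s)
        P.append(acc)
    P[-1] = 1.0                                   # guard against round-off in the last bin
    estimates = []
    for _ in range(B):
        sim = [0] * J
        for _ in range(n):
            sim[bisect.bisect_left(P, rng.random())] += 1   # P_{j-1} < U_k <= P_j
        lam_b, mu_b = fit_nspp(sim, delta)
        estimates.append(lam_b * math.exp(-mu_b * t_s) * -math.expm1(-mu_b * t_o))
    estimates.sort()
    return estimates[B // 20 - 1], estimates[B - B // 20 - 1]
```

With B = 1000 the returned limits are the 50th and 950th ordered values, matching the text. Fixing the seed makes a run reproducible; different seeds give slightly different limits.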


C. RESULTS 

The estimates for the parameters were obtained using three
different $\Delta$ values and three different $t_s$ values. The value
$t_o$ was selected such that $t_s + t_o$ = time of the last observed fault;
this allows the predicted expected number of faults to be compared
with the observed data. The data provided in Tables 1 through 6 are
the 90% confidence intervals obtained by the bootstrap. The most
difficult aspect of this thesis research was obtaining appropriate
test data. The data that I received from various sources were
unacceptable for various reasons: no testing history, severity of
faults not listed, no milestone events listed (e.g., one data set
covered 10 years but gave no indication of modifications to the
software), non-software errors listed with software errors, and
descriptions of errors that could not be interpreted (which may have
eliminated some of the problems mentioned above). The underlying
cause of this is that the organizations that I contacted for data do
not use any systematic method for determining software
reliability. A "warm fuzzy feeling" for the software seems to
be the current method used to judge the reliability of the
software. This feeling gets warmer and fuzzier as deadlines
draw closer. The data sets used in the analysis of the model
were obtained from a technical report on other software
reliability models (Abdalla et al., 1986). The data were
given as time (CPU) between failures. The results of the
bootstrap for Data Set 1 are given in Tables 1-3; the
graphical results (Dalal, 1990) are depicted in Figures 1-3.
The results of the bootstrap for Data Set 2 are given in
Tables 4-6; the graphical results (Dalal, 1990) are depicted
in Figures 4-6.


D. USE OF RESULTS 

Suppose a time $t_s$ has been spent testing the software, and
$n(t_s)$ faults were found. The $n(t_s)$ faults can be broken up
into $n_j$’s, the number of faults in each period $j$ of size
$\Delta = t_s/J$. This information can be used to estimate the
parameters $\mu$ and $\lambda$, and a point estimate of the mean or
expected number of faults to appear in the time interval
$(t_s, t_s + t_o]$. Operational testing of the system will require some
time $t_o$. Bootstrapping can now be done to assess the sampling
uncertainty in the estimate of the expected number of faults
to appear in $(t_s, t_s + t_o]$. This will be done by quoting
bootstrapped 90% confidence limits. The expected number of
faults predicted to occur can be compared to the requirements
of the system, e.g., for some time $t_o$, at most $F$
faults are allowed (suppose $F$ can be specified). If the
predicted expected number of faults is less than the allowable
number of faults, then system operational testing might be
worth the expense at this time. In contrast, if the
expected number of faults is greater than the specified number
of faults, then system operational testing should be postponed.
Testing should continue in the lab, at the developmental level,
until $t_s$ and $n(t_s)$ are large enough that the expected number of
faults for the required operational time meets the specification.

A more conservative approach is to replace the estimate of
the mean number of faults by the upper confidence limit of the
mean number of faults. Such a conservative approach is
recommended.
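The decision rule of this section, including the recommended conservative variant, reduces to a single comparison. Below is a minimal sketch with a hypothetical function name. The allowed number of faults $F$ must come from the system requirements.

```python
def schedule_operational_test(point_estimate, upper_limit, allowed_faults,
                              conservative=True):
    """Decision rule of Section D (a sketch).

    Returns True when the expected number of faults in (t_s, t_s + t_o]
    does not exceed the allowed number F. With conservative=True the
    upper 90% bootstrap limit replaces the point estimate, as recommended.
    """
    expected = upper_limit if conservative else point_estimate
    return expected <= allowed_faults
```

Take the 90% limits (4.69, 22.22) quoted for Data Set 1 with $t_s = 500$, $t_o = 1000$, and a hypothetical allowance of $F = 15$ faults. The conservative rule compares 22.22 with 15 and advises postponing operational testing.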

If there are no specifications, the individual responsible
for scheduling system operational testing will have to make a
subjective decision. Is the expected number of faults to
occur in $(t_s, t_s + t_o]$ small enough to warrant spending the money
to carry out system operational testing, or should this
testing be postponed until the expected number of faults is
lower? The assumption is that lab testing will continue on
the software, increasing $t_s$ and $n(t_s)$ but reducing the number
of unfound and uncorrected faults. The more faults found in
lab testing of the software, the fewer the number of faults
that are likely to occur in the more costly system operational
testing.


E. APPLICATION TO TWO DATA SETS 

The fitting and error assessment procedure was applied to 
two data sets (Abdalla et al, 1986). Figures 1, 2, and 3 
refer to Data Set 1; Figures 4, 5, and 6 to Data Set 2. 

Figure 1 has a $\Delta$ of 10 CPU minutes with three combinations
of $t_s$ and $t_o$. If the range of the expected number of faults
for $t_s = 1250$, $t_o = 250$ (2.21 to 6.09) is acceptable, the software
manager may choose to schedule operational testing. The same
argument can be made for $t_s = 1000$, $t_o = 500$. A problem occurs for
$t_s = 500$ and $t_o = 1000$. If the range for the expected number of
faults to occur (4.69 to 22.22) is acceptable, the software
manager may choose to schedule operational testing.
Unfortunately, 46 faults occur in $t_o$. This is
extremely likely to be the result of using an inappropriate
model (it does seem unlikely that software with as many as 22
mission-critical faults would be viewed as acceptable for
starting operational testing). What can the software manager
do to prevent something like this from occurring? Ideally, as
testing continues, the rate at which faults occur should
decrease (assuming a constant relative rate of testing), with
that rate asymptotically approaching zero as $t_s$ becomes large.
The slope of the estimated total expected number of faults
versus test time for Data Set 1 from T = 300 to T = 500 is m = 0.08
(faults/CPU min). Figure 1 depicts this: the rate at which
faults are occurring does not appear to be tapering off. The
software manager can use this information to support a
decision to go ahead with (or postpone) operational testing.
From T = 1000 to T = 1500 the slope is 0.028 (faults/CPU min) and
appears to be tapering off. The ranges of the expected number
of faults to occur in the specified $t_o$ accurately reflect what
actually occurred. If the range of the expected number of
faults is acceptable, the software manager should go ahead with
operational testing. Figure 2 ($\Delta$ = 20 CPU minutes) and Figure
3 ($\Delta$ = 50 CPU minutes) can be interpreted similarly.
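The slopes quoted above are average rates of increase of the fitted mean $\Lambda(t) = \lambda(1 - e^{-\mu t})$ over a window of test time. The small sketch below computes that average slope and the instantaneous rate $\lambda\mu e^{-\mu t}$, which approaches zero as testing continues. The function names are hypothetical, and the parameters would come from a fit such as (3.5) and (3.11).

```python
import math


def mean_slope(lam, mu, t1, t2):
    """Average slope (faults per unit time) of Lambda(t) = lam(1 - e^{-mu t}) on [t1, t2]."""
    # Lambda(t2) - Lambda(t1) = lam * e^{-mu t1} * (1 - e^{-mu (t2 - t1)})
    return lam * math.exp(-mu * t1) * -math.expm1(-mu * (t2 - t1)) / (t2 - t1)


def instantaneous_rate(lam, mu, t):
    """Derivative of Lambda(t): lam * mu * e^{-mu t}, which tends to 0 as t grows."""
    return lam * mu * math.exp(-mu * t)
```

A slope that shrinks from one window to the next is the tapering-off that the text looks for before recommending operational testing.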

The change in $\Delta$ for both data sets did not have a
significant impact on the range of the expected number of
faults to occur, indicating that the model is somewhat
insensitive to the size of $\Delta$.

Data Set 2 (Figures 4, 5, and 6) shows only a small
indication of the slope decreasing. This is why the
confidence limits of the expected number of faults are so wide.
The software manager can apply the same techniques listed
above to make a decision to schedule (or postpone) operational
testing. The software manager must repeatedly address the
questions: is the rate of occurrence of faults lessening, and
is the range of the expected number of faults acceptable to
support operational testing?

A fitted model may indicate a narrowing range of the expected
number of faults and a slope asymptotically approaching zero,
and consequently the software manager schedules operational
testing. Unfortunately, the results of the operational
testing may be poor, i.e., a relatively large number of errors
may occur, indicating that more developmental activity and
testing are required to improve the software. For example, for
Data Set 1 ($t_s = 500$, $t_o = 1000$) the upper confidence limit of the
expected number of faults in $t_o$ is about 22, but the number of
faults observed in $t_o$ was 46, more than twice the predicted amount.
This example illustrates the relationship between modeling and
testing. While a systematic underestimation indicates flaws in the
model, occasional underestimation simply reinforces that software
reliability models do not take the place of stressing software
within a full system in a real-life operational environment.
The purpose of this thesis is to provide the software manager
with a tool to aid in the decision as to when to initiate
operational testing, not to replace such a test.








TABLE 1
ESTIMATE OF PARAMETERS FOR DATA SET 1
t_s = 1250, t_o = 250 (CPU MINUTES)
Observed number of bugs in t_o is 6
90% Confidence Interval

Δ      Percentile   μ̂         λ̂         E[ΔN(t_o)]
10     5 %          0.00272   ?         ?
10     95 %         0.00176   146.509   ?
20     5 %          0.00270   134.993   ?
20     95 %         0.00174   147.798   6.09
50     5 %          0.00270   135.258   2.25
50     95 %         0.00175   148.142   ?


TABLE 2
ESTIMATE OF PARAMETERS FOR DATA SET 1
t_s = 1000, t_o = 500 (CPU MINUTES)
Observed number of bugs in t_o is 14
90% Confidence Interval

Δ      Percentile   μ̂         λ̂         E[ΔN(t_o)]
10     5 %          0.00298   ?         ?
10     95 %         0.00177   147.640   14.73
20     5 %          0.00296   ?         ?
20     95 %         0.00176   ?         ?
50     5 %          ?         ?         ?
50     95 %         ?         150.549   14.96















TABLE 3
ESTIMATE OF PARAMETERS FOR DATA SET 1
t_s = 500, t_o = 1000 (CPU MINUTES)
Observed number of bugs in t_o is 46
90% Confidence Interval

Δ      Percentile   μ̂         λ̂         E[ΔN(t_o)]
10     5 %          0.00600   ?         4.69
10     95 %         ?         ?         22.22
20     5 %          0.00600   ?         4.70
20     95 %         ?         ?         21.14
50     5 %          ?         ?         ?
50     95 %         ?         ?         ?















TABLE 4
ESTIMATE OF PARAMETERS FOR DATA SET 2
t_s = 800, t_o = 300 (CPU SECONDS)
Observed number of bugs in t_o is 12
90% Confidence Interval

Δ      Percentile   μ̂         λ̂         E[ΔN(t_o)]
?      5 %          0.00288   82.479    4.75
?      95 %         ?         ?         14.69
20     5 %          0.00288   82.718    4.78
20     95 %         ?         127.645   14.64
50     5 %          0.00287   83.722    4.78
50     95 %         ?         131.003   14.66


TABLE 5
ESTIMATE OF PARAMETERS FOR DATA SET 2
t_s = 600, t_o = 500 (CPU SECONDS)
Observed number of bugs in t_o is 21
90% Confidence Interval

Δ      Percentile   μ̂         λ̂         E[ΔN(t_o)]
?      5 %          0.00298   78.513    10.00
?      95 %         0.00068   ?         37.09
20     5 %          0.00296   79.189    ?
20     95 %         0.00067   200.950   ?
50     5 %          0.00298   80.710    10.14
50     95 %         0.00067   211.307   ?


TABLE 6
ESTIMATE OF PARAMETERS FOR DATA SET 2
t_s = 400, t_o = 700 (CPU SECONDS)
Observed number of bugs in t_o is 37
90% Confidence Interval

Δ      Percentile   μ̂         λ̂         E[ΔN(t_o)]
?      5 %          0.00456   ?         ?
?      95 %         0.00058   ?         62.43
20     5 %          0.00458   ?         ?
20     95 %         0.00054   ?         63.88
50     5 %          0.00446   62.014    9.45
50     95 %         0.00047   ?         ?


[Figure 1. Data Set 1, Δ = 10 (CPU minutes). Cumulative number of faults versus time (CPU minutes) for (t_s, t_o) = (1250, 250), (1000, 500), and (500, 1000).]

[Figure 2. Data Set 1, Δ = 20 (CPU minutes). Cumulative number of faults versus time (CPU minutes) for the same three (t_s, t_o) combinations.]

[Figure 3. Data Set 1, Δ = 50 (CPU minutes). Cumulative number of faults versus time (CPU minutes) for the same three (t_s, t_o) combinations.]

[Figure 4. Data Set 2, Δ = 20 (CPU seconds). Cumulative number of faults versus time (CPU seconds) for (t_s, t_o) = (800, 300), (600, 500), and (400, 700).]

[Figure 5. Data Set 2, Δ = 20 (CPU seconds). Cumulative number of faults versus time (CPU seconds) for the same three (t_s, t_o) combinations.]

[Figure 6. Data Set 2, Δ = 50 (CPU seconds). Cumulative number of faults versus time (CPU seconds) for the same three (t_s, t_o) combinations.]


IV. CONCLUSION 

Software reliability models are useful tools that managers
of software intensive projects have at their disposal. The
bootstrapping technique provides the manager a range for the
expected number of faults estimated to occur in some
additional operating time. The question is: is the upper
limit of the expected number of faults estimated to occur
acceptable? The potential risks are additional cost for
further testing or late product delivery. The ideal case is
reliable software delivered on time and on budget.
Unfortunately, reality is rarely ideal. The software manager
must decide: is it better to deliver a product on time that
may be considered unreliable by the user and be sent back for
further testing, or to deliver a product late but of
acceptable quality to the user? The purpose of this thesis is
to provide a quantitative tool for the manager who may have to
make such qualitative decisions. The use of software
reliability models is not without associated cost and risk.
The data must be collected for input to the model.
Recommendations for the type of data that should be collected
are:


• Operating time between failures (CPU time is best)
  (Musa and Okumoto, 1984).
• Calendar time between failures, although such times may
  not accurately reflect the opportunity for faults to
  reveal themselves (Musa et al, 1987).
• Testing history, e.g., how many people are involved in the
  testing effort.
• How the software was tested.
• Intensity of the software testing.
• Cost of testing, e.g., the cost to find and repair a fault
  before and after product delivery.

Without useful data a reliability model has little 
practical use. The model presented in this thesis should be 
validated using data from several Navy systems. 

There are several areas for further research. How
accurate are the predicted confidence limits in this model?
What are the limits of applicability of this model? What
effect do inaccuracies (due to replacing observed data with
hypothesized data in cases where insufficient data are
available) have on the model, i.e., how robust is the model?
Further development of other software reliability models
should be pursued. Emphasis should be placed on obtaining
confidence limits in addition to quoting a point estimate
of the expected number of failures predicted to appear in
some additional testing time. These models should be verified
using data obtained from Navy software intensive systems. It
is infeasible to test every possible branch in a large program
for faults. The software manager needs technical assistance
in identifying where effort and money should be spent to
deliver the best possible product. Will many faults in
portions of the software that are rarely used or reached cause
more problems for the user than a few faults in frequently
used or reached portions?


APPENDIX 


Software projects may have similar characteristics, such as
testing strategies or architecture, so that the information
obtained about the reliability of one software project may be
used to aid in the prediction of the reliability of another,
similar software project. This process can make use of
Bayesian methodology (Dalal and Mallows, 1990), (Farr, 1983).
If prior distributions of $\lambda$ and $\mu$ are specified, then this
information can be used to help estimate the parameters $\lambda$ and $\mu$;
the posterior for these is

$$p_{post}(\lambda, \mu) = K\, L(\lambda, \mu)\, p_1(\lambda)\, p_2(\mu) \tag{a.1a}$$

$$= K\, e^{-\lambda (1 - e^{-\mu t_s})}\, \lambda^{n(t_s)} \prod_{j=1}^{J} \bigl[e^{-\mu (j-1)\Delta}\bigl(1 - e^{-\mu\Delta}\bigr)\bigr]^{n_j}\, p_1(\lambda)\, p_2(\mu) \tag{a.1b}$$

where $p_1(\lambda)$ and $p_2(\mu)$ are the prior distributions of $\lambda$ and $\mu$
estimated from another software project that has
characteristics similar to the software project currently
being tested. The simplest idea is to integrate out $\lambda$ and
marginalize on $\mu$; since $\prod_j e^{-\mu (j-1)\Delta n_j} = e^{-\mu\Delta M}$, with $M$ as in (3.7), this yields:

$$p_{post}(\mu) = K \int_0^\infty e^{-\lambda (1 - e^{-\mu t_s})}\, \lambda^{n(t_s)}\, p_1(\lambda)\, d\lambda \;\, e^{-\mu\Delta M}\bigl(1 - e^{-\mu\Delta}\bigr)^{n(t_s)}\, p_2(\mu) \tag{a.2}$$


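The most convenient choice of $p_1(\lambda)$ is the (conjugate) Gamma:

$$p_1(\lambda) = \frac{\alpha^{\beta}\, \lambda^{\beta - 1}\, e^{-\alpha\lambda}}{\Gamma(\beta)} \tag{a.3}$$

which when substituted into equation (a.2) yields the density

$$p_{post}(\mu) = K\, e^{-\mu\Delta M}\bigl(1 - e^{-\mu\Delta}\bigr)^{n(t_s)}\, p_2(\mu) \int_0^\infty e^{-\lambda (1 - e^{-\mu t_s})}\, \lambda^{n(t_s)}\, \frac{\alpha^{\beta} \lambda^{\beta - 1} e^{-\alpha\lambda}}{\Gamma(\beta)}\, d\lambda \tag{a.4a}$$

$$= K\, e^{-\mu\Delta M}\bigl(1 - e^{-\mu\Delta}\bigr)^{n(t_s)}\, p_2(\mu)\, \frac{\alpha^{\beta}\, \Gamma(n(t_s) + \beta)}{\Gamma(\beta)\, \bigl(\alpha + 1 - e^{-\mu t_s}\bigr)^{n(t_s) + \beta}} \tag{a.4b}$$

$$= K'\, \frac{e^{-\mu\Delta M}\bigl(1 - e^{-\mu\Delta}\bigr)^{n(t_s)}}{\bigl(\alpha + 1 - e^{-\mu t_s}\bigr)^{n(t_s) + \beta}}\, p_2(\mu) \tag{a.4c}$$

Using an uninformative (constant) prior for $\mu$, and setting $x = e^{-\mu\Delta}$
(so that $e^{-\mu t_s} = x^J$), equation (a.4c) becomes

$$p(x) = K'\, \frac{x^{M}\, (1 - x)^{n(t_s)}}{\bigl(\alpha + 1 - x^J\bigr)^{n(t_s) + \beta}} \tag{a.5}$$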


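The mode of the density is found by maximizing its logarithm,

$$l(x) = \ln p(x) = M \ln x + n(t_s) \ln(1 - x) - \bigl(n(t_s) + \beta\bigr) \ln\bigl(\alpha + 1 - x^J\bigr) + \text{const.} \tag{a.6}$$

Taking the derivative of equation (a.6) with respect
to $x$, setting it equal to zero, and multiplying by $x/n(t_s)$ yields:

$$\frac{x}{1 - x} - \frac{n(t_s) + \beta}{n(t_s)}\, \frac{J x^J}{\alpha + 1 - x^J} = \frac{M}{n(t_s)} \tag{a.7}$$

If $\alpha = \beta = 0$, equation (a.7) is the same as equation (3.11), which
gives the MLE.
Suppose $m = E[\lambda]$ and $\sigma^2 = \mathrm{Var}[\lambda]$ in the prior; then $\alpha = m/\sigma^2$ and
$\beta = m(m/\sigma^2)$. Equation (a.7) is then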


$$\frac{x}{1 - x} - \frac{n(t_s) + m(m/\sigma^2)}{n(t_s)}\, \frac{J x^J}{m/\sigma^2 + 1 - x^J} = \frac{M}{n(t_s)} \tag{a.8}$$

If $\lambda$ is interpreted as the total number of faults in a
particular software project, then the number of faults is
discrete, so a discrete distribution should be used for the
prior; e.g., one could use a Poisson for the prior. However,
it is easier to work with a Gamma distribution. If the Gamma
distribution has the same mean and variance as a Poisson, then equation
(a.8) becomes (since $m = \sigma^2$)

$$\frac{x}{1 - x} - \frac{n(t_s) + m}{n(t_s)}\, \frac{J x^J}{2 - x^J} = \frac{M}{n(t_s)} \tag{a.9}$$

It is clear that the variance-to-mean ratio of the prior has a
strong influence on the effect of a prior estimate of the
mean.

One Bayesian approach to estimation is to find the mean
(rather than the mode, or highest point of the posterior, as is
essentially done in the likelihood approach) of the
(approximate) posterior, $0 \le x \le 1$. To obtain an approximate
posterior mean, proceed as follows. If $J$ is large, $x^J$ is small
provided $x < 1$, so expand $(\alpha + 1 - x^J)^{-(n+\beta)}$ in a Taylor series
to first order in $x^J$ to get

$$p(x) \approx K^{*}\left[x^{M}(1 - x)^{n} + \frac{n + \beta}{1 + \alpha}\, x^{M + J}(1 - x)^{n}\right] \tag{a.10}$$

where $n = n(t_s)$ and $M$ is given by (3.7).

Equation (a.10) is a convex combination of two beta densities.
$K^{*}$ can be found by requiring that the integral of (a.10) over $0 \le x \le 1$ equal 1.


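$E[x] = E[e^{-\mu\Delta}]$ can then be found:

$$E[x] = \frac{\dfrac{M + 1}{M + n + 2} + cR\, \dfrac{M + J + 1}{M + J + n + 2}}{1 + cR} \tag{a.11}$$

where $c = (n + \beta)/(1 + \alpha)$ and $R$ is the ratio of the two beta integrals,

$$R = \frac{B(M + J + 1,\, n + 1)}{B(M + 1,\, n + 1)} = \frac{(M + J)!\,(M + n + 1)!}{M!\,(M + J + n + 1)!} \tag{a.12}$$

Unfortunately, $n = n(t_s) = 136$ for Data Set 1; even with factoring
out $n(t_s)$, the factorials involved are too large to evaluate directly.
However, it is justifiable to use an approximation to the
factorials (for $M \gg n$, $\Gamma(a + k)/\Gamma(a) \approx a^k$), which gives
$R \approx \bigl(M/(M + J)\bigr)^{n+1}$ and

$$E[x] \approx \frac{\dfrac{M + 1}{M + n + 2} + \dfrac{n + \beta}{1 + \alpha}\left(\dfrac{M}{M + J}\right)^{n+1} \dfrac{M + J + 1}{M + J + n + 2}}{1 + \dfrac{n + \beta}{1 + \alpha}\left(\dfrac{M}{M + J}\right)^{n+1}} \tag{a.13}$$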
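The text notes that the factorials in (a.12) are unwieldy for $n(t_s) = 136$. A different workaround from the approximation in (a.13) is to evaluate the ratio $R$ on the log scale with `math.lgamma`, which keeps the mixture mean (a.11) exact. Below is a sketch using that swapped-in technique; the names are hypothetical, and $n$, $M$, $J$ are as defined above.

```python
import math


def log_beta(a, b):
    """ln B(a, b) computed from log-gamma values, so no factorial is formed."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def approx_posterior_mean_x(n, M, J, alpha, beta):
    """E[x] of Eq. (a.11) under the two-term expansion (a.10)."""
    c = (n + beta) / (1 + alpha)                  # weight of the x^(M+J) term
    R = math.exp(log_beta(M + J + 1, n + 1) - log_beta(M + 1, n + 1))   # Eq. (a.12)
    mean1 = (M + 1) / (M + n + 2)                 # mean of Beta(M+1, n+1)
    mean2 = (M + J + 1) / (M + J + n + 2)         # mean of Beta(M+J+1, n+1)
    return (mean1 + c * R * mean2) / (1 + c * R)
```

Because only log-gamma values are formed, the same function handles $n = 136$ and large $M$ without overflow, and (a.13) remains available as a closed-form check.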


The numerical results of equation (a.13) are in Tables A1
through A6 for Data Sets 1 and 2. The graphical results are
shown in Figures A1 through A6. The range of the estimated
number of faults to occur in $(t_s, t_s + t_o]$ is much smaller than
that of the bootstrap results discussed in Chapter III. None
of the results (estimated number of faults to occur) using the
Bayesian method contain the observed number of faults. A possible
explanation for this is inappropriate values for $\alpha$ and $\beta$.
After various projects have been analyzed with
software reliability models, the fault distribution may become
more apparent. This information can then be incorporated into
reliability models. I feel that, despite the surprising
initial results, this method does promise to be a useful tool
to the software manager.











TABLE A1
BAYESIAN ESTIMATE OF PARAMETERS FOR DATA SET 1
t_s = 1250, t_o = 250 (CPU MINUTES)
Observed number of bugs in t_o is 6
90% Confidence Interval

Δ      Percentile   μ̂         λ̂         E[ΔN(t_o)]
10     5 %          0.00339   131.892   1.08
10     95 %         0.00263   135.028   ?
20     5 %          0.00340   ?         ?
20     95 %         0.00264   ?         ?
50     5 %          0.00339   131.914   ?
50     95 %         ?         ?         2.45


TABLE A2
BAYESIAN ESTIMATE OF PARAMETERS FOR DATA SET 1
t_s = 1000, t_o = 500 (CPU MINUTES)
Observed number of bugs in t_o is 14
90% Confidence Interval

Δ      Percentile   μ̂         λ̂         E[ΔN(t_o)]
?      5 %          ?         ?         ?
?      95 %         ?         ?         4.54
?      5 %          0.00398   124.328   ?
?      95 %         ?         ?         4.58


TABLE A3
BAYESIAN ESTIMATE OF PARAMETERS FOR DATA SET 1
t_s = 500, t_o = 1000 (CPU MINUTES)
Observed number of bugs in t_o is 46
90% Confidence Interval

  5 %    0.00808       [illegible]   [illegible]
 95 %    0.00602       94.660        4.65
  5 %    [illegible]   91.608        [illegible]
 95 %    0.00601       [illegible]   [illegible]
  5 %    [illegible]   [illegible]   [illegible]
 95 %    0.00596       94.805        4.79














TABLE A4
BAYESIAN ESTIMATE OF PARAMETERS FOR DATA SET 2
t_s = 800, t_o = 300 (CPU SECONDS)
Observed number of bugs in t_o is 12
90% Confidence Interval

  5 %    0.00464       75.849        [illegible]
 95 %    0.00340       [illegible]   [illegible]
  5 %    0.00464       75.846        [illegible]
 95 %    0.00340       [illegible]   [illegible]
  5 %    0.00465       [illegible]   [illegible]
 95 %    [illegible]   [illegible]   [illegible]


TABLE A5
BAYESIAN ESTIMATE OF PARAMETERS FOR DATA SET 2
t_s = 600, t_o = 500 (CPU SECONDS)
Observed number of bugs in t_o is 21
90% Confidence Interval

  5 %    0.00600       [illegible]   1.74
 95 %    0.00429       [illegible]   4.74
  5 %    [illegible]   [illegible]   1.74
 95 %    0.00429       [illegible]   4.73
  5 %    [illegible]   [illegible]   [illegible]
 95 %    0.00429       [illegible]   4.74
TABLE A6
BAYESIAN ESTIMATE OF PARAMETERS FOR DATA SET 2
t_s = 400, t_o = 700 (CPU SECONDS)
Observed number of bugs in t_o is 37
90% Confidence Interval

  5 %    0.00897       50.391        [illegible]
 95 %    0.00625       53.386        [illegible]
  5 %    [illegible]   [illegible]   [illegible]
 95 %    0.00624       [illegible]   [illegible]
  5 %    [illegible]   50.426        1.42
 95 %    0.00622       53.432        4.38





[Figure: cumulative number of faults versus time (CPU minutes), with curves for t(S)=1250, t(O)=250; t(S)=1000, t(O)=500; and t(S)=500, t(O)=1000.]
Figure A1. Data Set 1, A = 10 (CPU minutes), Bayesian Method.


[Figure: cumulative number of faults versus time (CPU minutes), with curves for t(S)=1250, t(O)=250; t(S)=1000, t(O)=500; and t(S)=500, t(O)=1000.]
Figure A2. Data Set 1 (CPU minutes), Bayesian Method.


[Figure: cumulative number of faults versus time (CPU minutes), with curves for t(S)=1250, t(O)=250; t(S)=1000, t(O)=500; and t(S)=500, t(O)=1000.]
Figure A3. Data Set 1, A = 50 (CPU minutes), Bayesian Method.


[Figure: cumulative number of faults versus time (CPU seconds), with curves for the (t_s, t_o) splits of Tables A4 through A6.]
Figure A4. Data Set 2 (CPU seconds), Bayesian Method.


[Figure: cumulative number of faults versus time (CPU seconds), with curves for the (t_s, t_o) splits of Tables A4 through A6.]
Figure A5. Data Set 2, A = 20 (CPU seconds), Bayesian Method.


[Figure: cumulative number of faults versus time (CPU seconds), with curves for the (t_s, t_o) splits of Tables A4 through A6.]
Figure A6. Data Set 2, A = 50 (CPU seconds), Bayesian Method.


REFERENCES 


Abdel-Ghaly, A., Chan, P., and Littlewood, B., "Evaluation of
Competing Software Reliability Predictions," IEEE Transactions 
on Software Engineering, Vol.SE-12, No.9, pp.950-966, 
September 1986. 


Beizer, B., Software System Test and Quality Assurance, Van 
Nostrand Reinhold, 1984. 


Brooks, F., "No Silver Bullet," Information Processing, H.J. 
Kugler, ed., Elsevier Science Publishers, 1986. 


Dalal, S., Fowlkes, E., and Hoadley, B., "Risk Analysis of the
Space Shuttle: Pre-Challenger Prediction of Failure," Journal 
of the American Statistical Association, Vol.84, No.408, 
pp.945-958, December 1989. 


Dalal, S., and Mallows, C., "When Should One Stop Testing
Software," Journal of the American Statistical Association,
Vol.83, No.403, pp.872-879, September 1988.


Dalal, S., and Mallows, C., "Some Graphical Aids for Deciding 
When to Stop Testing," IEEE Journal, Selected Areas in 
Communications, 1990. 


Efron, B., "Bootstrap Confidence Intervals," Biometrika,
Vol.72, No.1, pp.45-58, April 1985.


Farr, W., "A Survey of Software Reliability Modeling and 
Estimation," Naval Surface Warfare Center, Dahlgren, Virginia,
September 1983. 


Goel, A., and Okumoto, K., "Time-Dependent Error-Detection Rate
Model for Software Reliability and Other Performance
Measures," IEEE Transactions on Reliability, Vol.R-28, No.3,
pp.206-211, 1979.


Goel, A., "Software Reliability Models: Assumptions,
Limitations and Applicability," IEEE Transactions on Software 
Engineering, Vol.SE-11, No.12, pp.1411-1423, December 1985. 


Hernandez, J., Naval Postgraduate School Masters Thesis,
"Derivation Strategy from Experience Based Test Oracles,"
1989.

Hetzel, B., The Complete Guide to Software Testing, 2nd ed., 
QED Information Sciences, Inc., 1988. 




IEEE Standard Glossary of Software Engineering Terms, IEEE
Press, 1984.


Moranda, P., "Predictions on Software Reliability,"
Proceedings of the Annual Reliability and Maintainability 
Symposium, Washington, D.C., 1975. 


Moranda, P., and Jelinski, Z., "Final Report on Software
Reliability Study," McDonnell Douglas Aeronautics Company, MDC 
Report Number 63921, 1972. 


Moranda, P., and Jelinski, Z., Statistical Computer Performance
Evaluation, pp.465-483, edited by Freiberger, W., Academic
Press, 1972. 


Musa, J., Iannino, A., and Okumoto, K., Software Reliability: 
Measurement, Prediction, Application, McGraw-Hill Book 
Company, 1987.


Musa, J., and Okumoto, K., "A Logarithmic Poisson Execution Time
Model for Software Reliability Measurement," Bell
Laboratories Technical Report, pp.230-238, 1984.


Schafer, R., Alter, J., Angus, J., and Enoto, S., "Validation 
of Software Reliability Models," Rome Air Development Center 
Technical Report RADC-TR-79-147, 1979. 


Schick, G., and Wolverton, R., "An Analysis of Competing 
Software Reliability Models," IEEE Transactions on Software 
Engineering, Vol.SE-4, No.2, pp.104-120, February 1978.


Schneidewind, N., "Analysis of Error Process in Computer 
Software," Proceedings of the 1975 International Conference on 
Reliable Software, pp.337-346, Los Angeles, California, 1975. 


Shooman, M., "Operational Testing and Software Reliability
Estimation During Program Development," Record of 1973 IEEE
Symposium on Computer Software Reliability, IEEE Computer
Society, May 1973.




BIBLIOGRAPHY 


Allison, K. and Baca, J., "Software Maturity Processing Comes 
of Age," Defense Computing, July-August 1988. 


Caruso, J., "Integrating Prior Knowledge with a Software 
Reliability Growth Model," IBM Corporation, 1990. 


Dalal, S., and Mallows, C., "Buying with Exact Confidence," 
Bell Communications Research, 1990. 


Department of the Air Force, AFOTEC Pamphlet 800-2 Vols 1-6, 
October 1990. 


Dunn, R., and Ullman, R., Quality Assurance for Computer
Software, McGraw-Hill Book Company, Inc., 1982.


Farr, W., and Srivastava, V., "The Use of Software Models in 
the Analysis of Operational Software System," Institute of 
Environmental Sciences Proceedings, 1985. 


Farr, W., and Smith, O., "Statistical Modeling and Estimation 
of Reliability Functions for Software (SMERFS) Library Access 
Guide," Naval Surface Warfare Center, Dahlgren, Virginia, 
March 1991. 

Farr, W., and Smith, O., "Statistical Modeling and Estimation
of Reliability Functions for Software (SMERFS) Users Guide,"
Naval Surface Warfare Center, Dahlgren, Virginia, March 1991.


Humphrey, W., Managing the Software Process, Addison-Wesley
Publishing Company, 1989. 


Littlewood, B., "Theories of Software Reliability: How Good 
Are They and How Can They Be Improved," IEEE Transactions on
Software Engineering, September 1980. 


McPherson, M. and Wiltse, J., "Software Maturity and Its Use 
as an Operational Test Readiness Criterion," Technical
Journal, January, 1989. 


Moranda, P., "Prediction of Software Reliability During 
Debugging," Proceedings 1975 Annual Reliability and 
Maintainability Symposium, 1975. 

Ohba, M., "Software Reliability Analysis Models," IBM Journal 
of Research and Development, July 1984. 




Ray, B., Bhandari, I., and Chillarege, R., "Reliability Growth 
for Typed Defects," IBM Watson Research Center, Yorktown 
Heights, 1987. 


Siefert, D., "Implementing Software Reliability Measures," The 
NCR Journal, March 1989. 


Wilson, L., and Shen, W., "Software Reliability Perspectives,"
unpublished research in progress at Old Dominion University,
Norfolk, Virginia, funded by NASA.


Yamada, S., Ohba, M., and Osaki, S., "S-Shaped Reliability 
Growth Modeling for Software Error Detection," IEEE 
Transactions on Reliability, December 1983. 


Yang, M., "On Optimal Stopping Rules in Software Reliability," 
Software Engineering Research Center, 1991. 




INITIAL DISTRIBUTION LIST 


Defense Technical Information Center 
Cameron Station 
Alexandria, Virginia 22304-6145 


Library, Code 0142 
Naval Postgraduate School 
Monterey, California 93942-5002 


Department Chairman, Code 30 
Department of Operations Research 
Naval Postgraduate School 
Monterey, California 93942-5002 


Professor Donald P. Gaver 
Department of Operations Research 
Naval Postgraduate School 
Monterey, California 93942-5002 


Professor Timothy J. Shimeall 
Computer Science Department 
Naval Postgraduate School 
Monterey, California 93942-5002 


Commander, Operational Test and Evaluation Force 
Chief of Staff 
Norfolk, Virginia 23511


Commander, Operational Test and Evaluation Force 
Technical Director, Code OTT 
Norfolk, Virginia 23511


Commander, Operational Test and Evaluation Force 
Deputy Chief of Staff for OTE Support, 30 Division
Norfolk, Virginia 23511


LT Thomas E. Dennison 
1668 Toledo Court 
Pacifica, California 94044 


Headquarters 

Air Force Operational Test and Evaluation Center 
LGS Division 

Kirtland Air Force Base, New Mexico 87117-7001 



